class: center, middle, inverse, title-slide

# Thompsons Creek Watershed
## Flow Estimation Methodologies
### Texas Water Resources Institute | Texas A&M AgriLife Research
### 2021-11-23

---

# Project Drivers

- The drainage-area ratio (DAR) method works well under most circumstances for predicting the flow duration curve (FDC) (Ries and Friesz, 2000; Asquith, Roussel, and Vrabel, 2006);

--

- Rarely more than a handful of instantaneous streamflow measurements are available to "validate" data (mean daily streamflow ≠ instantaneous streamflow);

--

- We also often want daily streamflow estimates; these work best with a gage in the same watershed, where daily peaks are more likely to correspond;

---

# Project Drivers

- SWAT is well established and accepted, especially in rural watersheds (many new extensions to incorporate groundwater, stormwater networks, etc.) (Arnold and Fohrer, 2005);

--

- Issue of validation in ungaged watersheds still exists;

--

- Often calibrated to downstream gages on mainstem reaches that may or may not be reflective of the subwatershed of interest;

---

# Project Drivers

- Can we efficiently provide mean daily streamflow measurements to validate streamflow estimation methods/models?

--

- DAR is a desirable approach: simple, reproducible, acceptable;

--

- If DAR doesn't perform well, do we have options to estimate streamflow other than the numerous numeric models that require substantial overhead and are somewhat difficult to interpret?

---

- Physical (numeric) models: lumped hydrologic models, distributed hydrologic models, etc.
  - varying degrees of interpretability, high data requirements, great for forecasting or predicting outside of calibration data range

--

- Statistical methods: linear regression, semi-parametric regression, etc.
  - higher interpretability, lower data requirements, not always great for forecasting

--

- Machine learning (statistical) methods: ANN, regression trees, ...
  - low interpretability, lower data requirements (?), popular in forecasting and prediction

---

# Methods

--

## 1: Develop mean daily streamflow period of record;

--

## 2: Evaluate DAR, linear regression, and semi-parametric regression for estimating streamflow records at site of interest.

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

.pull-left[
<img src="data:image/png;base64,#images/HOBO-U20L-01.jpg" width="90%" />
]

.pull-right[
- Record 15-minute stream depths using HOBO Water Level Logger
  - pressure transducer deployed instream and separate pressure transducers deployed to measure atmospheric pressure.
]

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

.pull-left[
<img src="data:image/png;base64,#images/labelled-IQ-Web-only.jpg" width="90%" />
]

.pull-right[
- Measure 15-minute streamflows using Sontek IQ Plus (bottom-mount acoustic Doppler velocity meter);
- Periodic deployments; uses a proprietary index-velocity method to calculate streamflow over the chosen interval.
]

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

.pull-left[
<img src="data:image/png;base64,#images/rating-curve-example.png" width="90%" />
]

.pull-right[
- Develop rating curves relating flow and depth over the course of the year
- Power function:
$$Q = K(H-H_0)^z$$
- parameterize $K$, $H_0$, and $z$
using nonlinear least squares (minimize SSE).
]

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

.pull-left[
Unsteady flow:

<img src="data:image/png;base64,#images/petersen-example.png" width="75%" />

Figure from Petersen-Øverleir (2006)
]

.pull-right[
- For unsteady flows use the modified Jones Formula:
$$Q = K(h-a)^n\times\sqrt{1 + x\frac{\partial h}{\partial t}}$$
- parameterize $K$, $a$, $n$, $x$;
- $\frac{\partial h}{\partial t}$
is the "rate of change" in stream height at a given time.

(Petersen-Øverleir, 2006; Zakwan, 2018)
]

???

Looped rating curves (aka hysteresis) are a result of unsteady flow effects, shifts in the channel bed, flat slopes with backwater effects, and tidal effects.

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

.pull-left[
- Fit one or more rating curves to the data based on visual inspection of the time series record;
- Changes in channel shape, vegetation, etc. will alter the rating curve and can necessitate updating the curve throughout the year;
]

.pull-right[
<img src="data:image/png;base64,#images/rating-curve-shift.png" width="90%" />
]

---

# Methods
## Develop 1-Yr Mean Daily Streamflow Period-of-Record

- Use the rating curve to calculate flows from the HOBO-measured depths;

--

- The 15-minute streamflow record is aggregated to mean daily streamflow.

---

# Methods
## Evaluate methods for estimating daily streamflows

- Information transfer methods

--

  - Statistical or algebraic transfer of runoff data from one watershed to another

--

  - DAR
$$Q_y^t = Q_x^t\bigg(\frac{A_y}{A_x}\bigg)^\phi$$

--

  - Linear regression between gaged site and ungaged site
$$Q_y = \beta_0 + \beta_1X_1 + \varepsilon$$
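
---

As a rough illustration of the two information transfer equations on the previous slide, the Python sketch below applies the DAR scaling to a gaged record and estimates the regression coefficients by ordinary least squares. The function names, the example flows and drainage areas, and the default $\phi = 1$ are assumptions made for this example; they are not values from the project.

```python
def drainage_area_ratio(q_gaged, area_ungaged, area_gaged, phi=1.0):
    """Transfer flows from a gaged site x to an ungaged site y.

    Implements Q_y = Q_x * (A_y / A_x) ** phi for each time step.
    phi = 1.0 is an assumed default, not a calibrated value.
    """
    scale = (area_ungaged / area_gaged) ** phi
    return [q * scale for q in q_gaged]


def fit_linear_regression(x, y):
    """Ordinary least squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta_1 = sxy / sxx
    beta_0 = mean_y - beta_1 * mean_x
    return beta_0, beta_1


# Hypothetical daily flows at the gaged site (units are whatever the gage reports).
gaged_flows = [10.0, 20.0, 40.0]

# Hypothetical drainage areas: the ungaged watershed is half the size of the gaged one.
dar_estimate = drainage_area_ratio(gaged_flows, area_ungaged=50.0, area_gaged=100.0)

# Hypothetical paired observations (gaged flow, measured flow at the site of interest).
beta_0, beta_1 = fit_linear_regression([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
regression_estimate = [beta_0 + beta_1 * q for q in gaged_flows]
```

Choosing `phi` different from 1 pulls the area scaling factor toward or away from 1; the sketch leaves that choice to the user.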
---

# Methods
## Evaluate methods for estimating daily streamflows

- Information transfer methods
  - Assumes similar precipitation and rainfall-runoff characteristics

--

  - Generally performs well for FDC estimation, but estimating daily streamflow values requires a well-chosen gaged site.

---

# Methods
## Evaluate methods for estimating daily streamflows

- Semi-parametric rainfall-runoff regression
  - The Generalized Additive Model (GAM) is an extension of the Generalized Linear Model (GLM) that incorporates nonlinear forms of the predictor variables. The advantage of generalized models over linear regression models is the ability to incorporate different distributions in the error structure and a flexible *link function* that relates the mean of the response to the predictor variables. We know streamflow is restricted to values ≥ 0 and has an extremely skewed distribution. By selecting the appropriate family and link structure we can restrict the response to the positive space.
  - GAMs also allow nonlinear relationships between the predictor and response variables.
$$Q_y = \beta_0 + f(x_1) + \varepsilon$$
- **WHAT?!**

---

**GAMs fit wobbly lines (smooth functions) to the data**

<img src="data:image/png;base64,#images/gam-crs-animation.gif" width="90%" />

???

GAMs use splines to represent the non-linear relationships between covariates, here `x`, and the response variable on the `y` axis.

Splines are built up from basis functions.

Here I'm showing a cubic regression spline basis with 10 knots/functions.

We weight each basis function to get a spline. Here all the basis functions have the same weight, so they would fit a horizontal line.

But if we choose different weights we get a more wiggly spline.

Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints.

---

**Too wobbly or not wobbly enough?**

<img src="data:image/png;base64,#schramm-thompsons-creek-2021_files/figure-html/unnamed-chunk-1-1.png" width="90%" />

???

Here we show the smooth function in a GAM constrained to different levels of smoothing. The dotted line is the underlying true function (a sine curve) and the points are randomly generated by the function but with a normally distributed random error.

The first plot predicts the data best, but is harder to explain and has a higher variance from the underlying sine function that describes the true data. As we smooth the curve, the variance decreases but our in-sample error increases.

GAMs assist with finding the balance between overfitting the data and creating a curve that is too smooth and misses important patterns in the underlying data.

---

.left-column[
### Nutrient Load Prediction
]

.right_column[
.center[
<img src="data:image/png;base64,#images/hagemann-example.png" width="50%" />

(Hagemann, Kim, and Park, 2016)
]
]

???

This was probably the first published application I came across using GAMs in a water quality context. This study used GAMs to model daily nutrient concentrations for a stream that fed a drinking water reservoir.
---

.left-column[
### Nutrient Load Prediction
### Chesapeake Bay Program
]

.right_column[
.center[
<img src="data:image/png;base64,#images/murphy-example.png" width="50%" />

(Murphy, Perry, Harcum, and Keisman, 2019)
]
]

???

The Chesapeake Bay Program has been using GAMs to model and assess nutrient concentrations in the Bay as part of the Bay TMDL. This figure shows modeled concentration, modeled average concentrations, and flow adjustments used to compare concentrations between years.

---

.left-column[
### Nutrient Load Prediction
### Chesapeake Bay Program
### Discharge/Velocity Prediction
]

.right_column[
.center[
<img src="data:image/png;base64,#images/asquith-example.png" width="50%" />

(Asquith, Herrmann, and Cleveland, 2013)
]
]

???

USGS study to predict the mean discharge and velocity of streams during direct runoff conditions (~95% exceedance). It is difficult to see on these maps, but they are showing lower stream velocities in eastern streams compared to western streams, probably attributable to vegetation and stream morphology.

---

.left-column[
### Nutrient Load Prediction
### Chesapeake Bay Program
### Discharge/Velocity Prediction
### Environmental Mitigation
]

.right_column[
.center[
<img src="data:image/png;base64,#images/Schramm-example.png" width="30%" />

(Schramm, Bevelhimer, and Scherelis, 2017)
]
]

???

Before I joined TWRI, I worked on the hydropower program for DOE at Oak Ridge National Lab. We did studies on environmental mitigation and planning for hydropower production. This study looked at how fish respond to hydrokinetic turbine noise. We used GAMs to assess and predict whether fish avoid or are attracted to the sound produced by turbines. GAMs were essential for modelling this non-linear response.
---

.left-column[
### Nutrient Load Prediction
### Chesapeake Bay Program
### Discharge/Velocity Prediction
### Environmental Mitigation
### Upper Llano Watershed
]

.right_column[
.center[
<img src="data:image/png;base64,#images/Schramm-Llano.png" width="30%" />

(Schramm, Broad, and Arsuffi, 2018)
]
]

???

Another water quality example: we used GAMs to model and describe trends in various water quality parameters in the Upper Llano watershed. Here we show the non-linear response of *E. coli* concentrations to flow over time and by season. We were able to show seasonal decreases under certain conditions.

---

## GAM
$$Q = f(P) + f(T) + f(P_{lag,1}) + f(P_{sum,3}) + f(T_{mean,5}) + f(H) + f(M)$$

- $f() =$ some unknown smoothing function
- $P =$ log(Precipitation + 1)
- $T =$ squared max temp
- $P_{lag,1} =$ 1 day lag P
- $P_{sum,3} =$ 3 day sum rainfall
- $T_{mean,5} =$ 5 day mean $T_{max}$
- $H =$ Relative Humidity
- $M =$
Month

---

# Results

---

.pull-left[
**Rating Curve: 16396 Thompsons @ Silver Hill Rd**

<img src="data:image/png;base64,#images/03-ratingcurve-16396.png" width="90%" />
]

.pull-right[
**Rating Curve Parameters & Fit**

| K | a | n | x | NSE | nRMSE |
| - | - | - | - | --- | ----- |
| 4.8077 | 0.366 | 0.4816 | -0.1808 | 0.99 | 2.5 |
| 1.5574 | -0.696 | 1.3786 | -0.0786 | 0.73 | 6.8 |
| 4.3915 | 0.1785 | 0.6552 | 0.0808 | 0.97 | 1.8 |
]

---

.pull-left[
## 15-minute Streamflow

<img src="data:image/png;base64,#images/05-15-minute-streamflow.png" width="95%" />
]

.pull-right[
## Naturalized Hydrograph

<img src="data:image/png;base64,#images/naturalized-hydrograph.png" width="70%" />

<small>WWTF influences removed</small>
]

---

## DAR results (Thompsons @ Silver Hill Rd.)

.pull-left[
<img src="data:image/png;base64,#images/dar-16396.png" width="95%" />
]

.pull-right[
| Method | NSE | KGE |
|--------|-----|-----|
| DAR 08065800 | -0.27 | -0.08 |
| DAR 08109800 | 0.25 | -0.36 |
| DAR 08110100 | 0.26 | -0.22 |
]

---

## Linear Regression Results

.pull-left[
<img src="data:image/png;base64,#images/linear-regression.png" width="95%" />
]

.pull-right[
| Method | NSE | KGE |
|--------|-----|-----|
| Linear Regression | 0.52 | 0.21 |
]

---

## GAM Results

.pull-left[
<img src="data:image/png;base64,#images/gam.png" width="95%" />
]

.pull-right[
| Method | NSE | KGE |
|--------|-----|-----|
| GAM | 0.425 | 0.46 |
]

---

## Metrics

### Site 16396

| Method | NSE | KGE |
|--------|-----|-----|
| DAR 08065800 | -0.27 | -0.08 |
| DAR 08109800 | 0.25 | -0.36 |
| DAR 08110100 | 0.26 | -0.22 |
| Linear Regression | 0.52 | 0.21 |
| GAM | 0.425 | 0.46 |

---

## Cross-validation

We want to evaluate how this approach works on data outside of the data we fit the models to. Normally, we hold out a portion of data and use it as a test data set. However, we don't have much data.
We use Monte-Carlo Cross Validation:

<img src="data:image/png;base64,#images/mccv.png" width="75%" />

---

.pull-left[
**Linear regression**

<img src="data:image/png;base64,#images/cross-validation_lm.png" width="80%" />
]

.pull-right[
**GAM**

<img src="data:image/png;base64,#images/cross-validation.png" width="80%" />
]

---

## Flow Duration Curves

.pull-left[
<img src="data:image/png;base64,#images/fdc_16396.png" width="100%" />
]

.pull-right[
<img src="data:image/png;base64,#images/predicted_fdc_16396.png" width="100%" />
]

---

## To Do:

- Revisit rating curves. Overprediction of high flow events might be an issue.
- Explore covariates (PET) and effects of normalizing streamflows before fitting regression-based models
- Compare precipitation-driven GAMs to rainfall-runoff approaches (HyMOD, SCS-CN, TWDB Rainfall-Runoff model, etc.)
- Explore the period of record needed to confidently predict out-of-sample data.

---

## Some lessons:

- Site selection is difficult for deploying bottom-mount Dopplers
- More frequent data, possibly longer (>1 year) sampling would be useful (need to quantify tradeoffs between data collection costs and developing distributed models like SWAT)
- Flow data collected alongside routine data will avoid some of the challenges associated with recreating daily flow records; it is much easier to estimate flow exceedances and match flows instead of collection dates

---

# References

<small>

Arnold, J. G. and N. Fohrer (2005). "SWAT2000: current capabilities and research opportunities in applied watershed modelling". In: _Hydrological Processes_ 19.3, pp. 563-572. ISSN: 0885-6087, 1099-1085. DOI: [10.1002/hyp.5611](https://doi.org/10.1002%2Fhyp.5611). URL: [http://doi.wiley.com/10.1002/hyp.5611](http://doi.wiley.com/10.1002/hyp.5611).

Asquith, W. H., G. R. Herrmann, and T. G. Cleveland (2013). "Generalized Additive Regression Models of Discharge and Mean Velocity Associated with Direct-Runoff Conditions in Texas: Utility of the U.S.
Geological Survey Discharge Measurement Database". In: _Journal of Hydrologic Engineering_ 18.10, pp. 1331-1348. ISSN: 1084-0699, 1943-5584. DOI: [10.1061/(ASCE)HE.1943-5584.0000635](https://doi.org/10.1061%2F%28ASCE%29HE.1943-5584.0000635). URL: [http://ascelibrary.org/doi/10.1061/

Asquith, W. H., M. C. Roussel, and J. Vrabel (2006). _Statewide Analysis of the Drainage-Area Ratio Method for 34 Streamflow Percentile Ranges in Texas_. Technical Report 2006-5286. U.S. Geological Survey. URL: [https://pubs.usgs.gov/sir/2006/5286/pdf/sir2006-5286.pdf](https://pubs.usgs.gov/sir/2006/5286/pdf/sir2006-5286.pdf).

Hagemann, M., D. Kim, and M. H. Park (2016). "Estimating Nutrient and Organic Carbon Loads to Water-Supply Reservoir Using Semiparametric Models". In: _Journal of Environmental Engineering_ 142.8, p. 04016036. ISSN: 0733-9372, 1943-7870. DOI: [10.1061/(ASCE)EE.1943-7870.0001077](https://doi.org/10.1061%2F%28ASCE%29EE.1943-7870.0001077). URL: [http://ascelibrary.org/doi/10.1061/

Murphy, R. R., E. Perry, J. Harcum, et al. (2019). "A Generalized Additive Model approach to evaluating water quality: Chesapeake Bay case study". In: _Environmental Modelling & Software_ 118, pp. 1-13. ISSN: 13648152. DOI: [10.1016/j.envsoft.2019.03.027](https://doi.org/10.1016%2Fj.envsoft.2019.03.027). URL: [https://linkinghub.elsevier.com/retrieve/pii/S1364815218307801](https://linkinghub.elsevier.com/retrieve/pii/S1364815218307801).

Petersen-Øverleir, A. (2006). "Modelling stage-discharge relationships affected by hysteresis using the Jones formula and nonlinear regression". In: _Hydrological Sciences Journal_ 51.3, pp. 365-388. DOI: [10.1623/hysj.51.3.365](https://doi.org/10.1623%2Fhysj.51.3.365).

</small>

---

# References (cont.)

<small>

Ries, K. G. and P. J. Friesz (2000). _Methods for Estimating Low-Flow Statistics for Massachusetts Streams_. Technical Report 00-4135. U.S. Geological Survey, p. 81.

Schramm, M. P., M. Bevelhimer, and C. Scherelis (2017). "Effects of hydrokinetic turbine sound on the behavior of four species of fish within an experimental mesocosm". In: _Fisheries Research_ 190, pp. 1-14. ISSN: 01657836. DOI: [10.1016/j.fishres.2017.01.012](https://doi.org/10.1016%2Fj.fishres.2017.01.012).

Schramm, M., T. Broad, and T. Arsuffi (2018). _Escherichia coli and Dissolved Oxygen Trends in the Upper Llano River Watershed, Texas (2001-2016)_. Technical Report TR-511. College Station, Texas: Texas Water Resources Institute, p. 27. URL: [https://twri.tamu.edu/media/1458/tr-511.pdf](https://twri.tamu.edu/media/1458/tr-511.pdf).

Zakwan, M. (2018). "Spreadsheet-based modelling of hysteresis-affected curves". In: _Applied Water Science_ 8.4, p. 101. DOI: [10.1007/s13201-018-0745-3](https://doi.org/10.1007%2Fs13201-018-0745-3).

</small>

---

**Extra Slides**

---

<img src="data:image/png;base64,#images/appendix_marginal_gam1.png" width="95%" />

---

**GAM Summary 16396**

| Component | Term | Estimate | Std Error | t-value | p-value |
|-----------|-------|-----------|-----------|---------|---------|
| A. parametric coefficients | (Intercept) | 2.034 | 0.038 | 53.804 | \*\*\* |
| Component | Term | edf | Ref. df | F-value | p-value |
| B. smooth terms | s(ewood_precip) | 2.638 | 9.000 | 13.269 | \*\*\* |
| | s(ewood_tmax) | 0.000 | 9.000 | 0.000 | |
| | s(lagPrecip) | 0.001 | 9.000 | 0.000 | |
| | s(wetness) | 5.372 | 9.000 | 22.621 | \*\*\* |
| | s(et) | 4.383 | 9.000 | 3.662 | \*\*\* |
| | s(ewood_rh) | 0.000 | 9.000 | 0.000 | |
| | s(month) | 5.959 | 8.000 | 6.926 | \*\*\* |

Signif. codes: 0 <= '\*\*\*' < 0.001 < '\*\*' < 0.01 < '\*' < 0.05 < '.' < 0.1 < '' < 1

Adjusted R-squared: 0.304, Deviance explained 0.801

-REML : 1114.800, Scale est: 0.553, N: 387
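
---

**Monte-Carlo cross-validation sketch**

To make the cross-validation idea from the results more concrete, here is a minimal Python sketch of repeated random train/test splits scored with NSE. It is an illustration under stated assumptions, not the code behind the results: the function names, the 20% test fraction, the number of splits, and the mean-only baseline model are all hypothetical choices made for this example.

```python
import random


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def monte_carlo_cv(x, y, fit, predict, n_splits=100, test_fraction=0.2, seed=1):
    """Score a model on repeated random train/test splits (assumed settings)."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(2, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split each iteration
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        simulated = predict(model, [x[i] for i in test_idx])
        scores.append(nse([y[i] for i in test_idx], simulated))
    return scores


def fit_mean(x_train, y_train):
    """Hypothetical baseline model: remember the mean of the training response."""
    return sum(y_train) / len(y_train)


def predict_mean(model, x_test):
    """Predict the stored training mean for every test point."""
    return [model for _ in x_test]


# Synthetic data for illustration only.
x_demo = list(range(20))
y_demo = [float(v * v) for v in x_demo]
demo_scores = monte_carlo_cv(x_demo, y_demo, fit_mean, predict_mean,
                             n_splits=10, test_fraction=0.25, seed=42)
```

Any `fit`/`predict` pair with the same shape (for example a regression or a GAM wrapper) could be passed in place of the mean baseline; the spread of the returned scores gives a rough picture of out-of-sample performance.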